Access public data using R in Oracle

Access public data using R in Oracle

Access data via Oracle (AFSC only)

AFSC Oracle users can access the database via SQL developer to view and pull the production data directly from the GAP_PRODUCTS Oracle schema.

Connect to Oracle from R

Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:

Here, the user can write in their username and password directly into the RODBC connect function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.

Connect to Oracle from R

Access data via Oracle

If the user has access to the AFSC Oracle database, the user can use SQL developer to view and pull the GAP Products data directly from the GAP_PRODUCTS Oracle schema.

Connect to Oracle from R

Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:

Here, the user can establish the oracle connection by entering their username and password in the channel <- gapindex::oracle_connect() function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.

Data SQL Query Examples:

Ex. 0: Select all data from a table

You can download all of the tables locally using a variation of the code below. Once connected, pull and save the tables of interest into the R environment.

locations <- c(
  "GAP_PRODUCTS.CPUE",
  "GAP_PRODUCTS.BIOMASS",
  "GAP_PRODUCTS.AGECOMP",
  "GAP_PRODUCTS.SIZECOMP",
  "GAP_PRODUCTS.STRATUM_GROUPS",
  "GAP_PRODUCTS.AREA_ID",
  "GAP_PRODUCTS.DESIGN_TABLE",
  "GAP_PRODUCTS.TAXONOMICS_WORMS", 
  "GAP_PRODUCTS.TAXONOMICS_ITIS", 
  "GAP_PRODUCTS.TAXONOMIC_CONFIDENCE", 
  "GAP_PRODUCTS.METADATA_COLUMN" 
)

for (i in 1:length(locations)) {
  print(locations[i])
  a <- RODBC::sqlQuery(channel, paste0("SELECT * FROM ", locations[i]))
  write.csv(x = a, file = here::here("data", paste0(locations[i], ".csv")))
}

Ex. 1: GOA Pacific Ocean perch biomass and abundance

Biomass and abundance for Pacific Ocean perch from 1990 – 2023 for the western/central/eastern GOA management areas as well as for the entire region.

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"WITH FILTERED_STRATA AS (
SELECT AREA_ID, DESCRIPTION FROM GAP_PRODUCTS.AREA
WHERE TYPE in ('REGULATORY AREA', 'REGION') 
AND SURVEY_DEFINITION_ID = 47)
SELECT 
BIOMASS_MT,
POPULATION_COUNT, 
YEAR, 
DESCRIPTION
FROM GAP_PRODUCTS.BIOMASS BIOMASS
JOIN FILTERED_STRATA STRATA 
ON STRATA.AREA_ID = BIOMASS.AREA_ID
WHERE BIOMASS.SURVEY_DEFINITION_ID IN 47 
AND BIOMASS.SPECIES_CODE = 30060")
dat0 <- dat %>% 
  janitor::clean_names() %>% 
  dplyr::select(biomass_mt, population_count, year, area = description) %>%
  pivot_longer(cols = c("biomass_mt", "population_count"), 
               names_to = "var", 
               values_to = "val") %>% 
  dplyr::mutate(
    val = ifelse(var == "biomass_mt", val/1e6, val/1e9), 
    var = ifelse(var == "biomass_mt", "Biomass (Mmt)", "Population (B)"), 
    area = gsub(x = area, pattern = " - ", replacement = "\n"), 
    area = gsub(x = area, pattern = ": ", replacement = "\n"), 
    type = sapply(X = strsplit(x = area, split = "\n", fixed = TRUE), `[[`, 2))  %>% 
  dplyr::arrange(type) %>% 
  dplyr::mutate(
    area = factor(area, levels = unique(area), labels = unique(area), ordered = TRUE))

# install.packages("scales")
library(scales)
figure <- ggplot2::ggplot(
  dat = dat0, 
  mapping = aes(x = year, y = val, color = type)) +
  ggplot2::geom_point(size = 3) + 
  ggplot2::facet_grid(cols = vars(area), rows = vars(var), scales = "free_y") + 
  ggplot2::scale_x_continuous(name = "Year", n.breaks = 3) +
  ggplot2::scale_y_continuous(name = "Estimate", labels = comma) +
  ggplot2::labs(title = 'GOA Pacific Ocean perch biomass and abundance 1990 – 2023')  + 
  ggplot2::guides(color=guide_legend(title = "Region Type"))+
  ggplot2::scale_color_grey() +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.direction = "horizontal", 
                 legend.position = "bottom")

flextable::flextable(head(dat))

BIOMASS_MT

POPULATION_COUNT

YEAR

DESCRIPTION

110,656.50

276,006,739

1,984

EASTERN GOA - INPFC

100,542.22

183,617,817

1,987

EASTERN GOA - INPFC

101,678.29

177,314,827

1,990

EASTERN GOA - INPFC

151,580.88

235,121,936

1,993

EASTERN GOA - INPFC

212,035.74

384,342,830

1,996

EASTERN GOA - INPFC

77,115.84

192,233,028

1,999

EASTERN GOA - INPFC

figure

Ex. 2: AI Rock sole size compositions and ridge plot

Northern and Southern rock sole size composition data from 1991 – 2022 for the Aleutian Islands, with Ridge plot from ggridges.

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"WITH FILTERED_STRATA AS (
SELECT 
AREA_ID, 
DESCRIPTION 
FROM GAP_PRODUCTS.AREA
WHERE TYPE = 'REGION' 
AND SURVEY_DEFINITION_ID = 52)
SELECT 
LENGTH_MM, 
YEAR
FROM GAP_PRODUCTS.SIZECOMP SIZECOMP
JOIN FILTERED_STRATA STRATA 
ON STRATA.AREA_ID = SIZECOMP.AREA_ID
WHERE SIZECOMP.SURVEY_DEFINITION_ID IN 52 
AND SIZECOMP.SPECIES_CODE IN (10261, 10262)")
dat0 <- dat %>% 
  janitor::clean_names() %>% 
  dplyr::mutate(length_cm = length_mm/10)

# install.packages("ggridges")
library(ggridges)
figure <- 
  ggplot2::ggplot(
    data = dat0, 
    mapping = aes(x = length_cm, y = as.factor(year), fill = stat(x))) +
  ggridges::theme_ridges(center_axis_labels = TRUE) + 
  ggridges::geom_density_ridges_gradient(scale = 4, show.legend = FALSE) + 
  ggplot2::scale_y_discrete(name = "Year", expand = c(0.01, 0)) +
  ggplot2::scale_x_continuous(name = "Length (cm)", expand = c(0.01, 0)) +
  # ggplot2::scale_fill_grey() +
  ggplot2::labs(title = 'AI Rock sole Size Compositions 1991 – 2022') 

flextable::flextable(head(dat))

LENGTH_MM

YEAR

110

1,997

130

1,997

140

1,997

150

1,997

160

1,997

170

1,997

figure

Ex. 3: EBS Walleye Pollock Age Compositions and Age Pyramid

Walleye pollock age composition for the EBS Standard Area from 1982 – 2022 and the EBS + NW Area from 1987 – 2022, with age pyramid plot.

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"WITH FILTERED_STRATA AS (
SELECT 
AREA_ID, 
DESCRIPTION 
FROM GAP_PRODUCTS.AREA
WHERE TYPE = 'REGION' AND 
SURVEY_DEFINITION_ID = 98)
SELECT 
AGECOMP.AGE, 
AGECOMP.POPULATION_COUNT, 
AGECOMP.SEX
FROM GAP_PRODUCTS.AGECOMP AGECOMP
JOIN FILTERED_STRATA STRATA 
ON STRATA.AREA_ID = AGECOMP.AREA_ID
WHERE SURVEY_DEFINITION_ID = 98 
AND SPECIES_CODE = 21740
AND AGE >= 0")
dat0 <- dat %>% 
  janitor::clean_names() %>% 
  dplyr::filter(sex %in% c(1,2)) %>%
  dplyr::mutate(
    sex = ifelse(sex == 1, "M", "F"),
    population_count = # change male population to negative
      ifelse(sex=="M", population_count*(-1), population_count*1)/1e9) 

figure <- ggplot2::ggplot(
  data = dat0, 
  mapping = 
                 aes(x = age,
                     y = population_count, 
                     fill = sex)) +
  ggplot2::scale_fill_grey() +
  ggplot2::geom_bar(stat = "identity") +
  ggplot2::coord_flip() +
  ggplot2::scale_x_continuous(name = "Age") +
  ggplot2::scale_y_continuous(name = "Population (billions)", labels = abs) +
  ggplot2::ggtitle(label = "EBS Walleye Pollock Age Compositions 1982 – 2022")  + 
  ggplot2::guides(fill = guide_legend(title = "Sex"))+
  ggplot2::theme_bw()

flextable::flextable(head(dat))

AGE

POPULATION_COUNT

SEX

1

137,398,782

1

2

1,040,787,296

1

3

1,183,060,368

1

4

1,661,729,359

1

5

507,355,476

1

6

70,450,613

1

figure

Ex. 4: NBS Pacific cod biomass and abundance

Pacific cod biomass and abundance data for the NBS by stratum.

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"WITH FILTERED_STRATA AS (
SELECT 
AREA_ID, 
AREA_NAME, 
DESCRIPTION 
FROM GAP_PRODUCTS.AREA
WHERE TYPE in ('STRATUM') AND 
SURVEY_DEFINITION_ID = 143) 
SELECT 
BIOMASS.BIOMASS_MT, 
BIOMASS.POPULATION_COUNT, 
BIOMASS.YEAR, 
STRATA.AREA_NAME
FROM GAP_PRODUCTS.BIOMASS BIOMASS 
JOIN FILTERED_STRATA STRATA 
ON STRATA.AREA_ID = BIOMASS.AREA_ID
WHERE BIOMASS.SURVEY_DEFINITION_ID IN 143 
AND BIOMASS.SPECIES_CODE = 21720")
dat0 <- dat %>% 
  janitor::clean_names() %>% 
  dplyr::select(biomass_mt, population_count, year, area = area_name) %>%
  pivot_longer(cols = c("biomass_mt", "population_count"), 
               names_to = "var", 
               values_to = "val") %>% 
  dplyr::mutate(
    val = ifelse(var == "biomass_mt", val/1e6, val/1e9), 
    var = ifelse(var == "biomass_mt", "Biomass (Mmt)", "Population (B)"), 
    area = factor(area, levels = unique(area), labels = unique(area), ordered = TRUE))

figure <- ggplot2::ggplot(
  dat = dat0, 
  mapping = aes(y = val, x = year, fill = area))  + 
  ggplot2::geom_bar(position="stack", stat="identity") +  
  ggplot2::facet_grid(rows = vars(var), scales = "free_y") +
  ggplot2::scale_y_continuous(name = "Estimate", labels = comma) +
  ggplot2::scale_x_continuous(name = "Year", breaks = unique(dat0$year)) +
  ggplot2::labs(title = 'NBS Pacific cod biomass and abundance by stratum')  + 
  ggplot2::guides(fill=guide_legend(title = "Region Type"))+
  ggplot2::scale_fill_grey() +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.direction = "horizontal", 
                 legend.position = "bottom")

flextable::flextable(head(dat))

BIOMASS_MT

POPULATION_COUNT

YEAR

AREA_NAME

7,462.5586

4,724,153.5

2,010

Inner Domain

7,462.5586

4,724,153.5

2,010

Inner Domain

7,462.5586

4,724,153.5

2,010

Inner Domain

7,462.5586

4,724,153.5

2,010

Inner Domain

7,462.5586

4,724,153.5

2,010

Inner Domain

680.4357

250,836.5

2,010

Middle Domain

figure

Ex. 5: GOA Pacific Ocean perch biomass and line plot

Pacific Ocean perch biomass totals for GOA between 1984-2021 from GAP_PRODUCTS.BIOMASS

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"SELECT 
SURVEY_DEFINITION_ID, 
BIOMASS_MT, 
YEAR
FROM GAP_PRODUCTS.BIOMASS
WHERE SPECIES_CODE = 30060 
AND SURVEY_DEFINITION_ID = 47 
AND AREA_ID = 99903 
AND YEAR BETWEEN 1984 AND 2021;") %>% 
  janitor::clean_names() %>% 
  dplyr::mutate(biomass_mt = biomass_mt/1000)
a_mean <- dat %>% 
  dplyr::group_by(survey_definition_id) %>% 
  dplyr::summarise(biomass_mt = mean(biomass_mt, na.rm = TRUE), 
                   minyr = min(year, na.rm = TRUE), 
                   maxyr = max(year, na.rm = TRUE)) 

figure <-
  ggplot(data = dat, 
         mapping = aes(x = year, 
                       y = biomass_mt)) +
  ggplot2::geom_point(size = 2.5, color = "grey40") + 
  ggplot2::scale_x_continuous(
    name = "Year", 
    labels = scales::label_number(
      accuracy = 1, 
      big.mark = ""))   +
  ggplot2::scale_y_continuous(
    name = "Biomass (Kmt)", 
    labels = comma) +
  ggplot2::geom_segment(
    data = a_mean,
    mapping = aes(x = minyr, 
                  xend = maxyr, 
                  y = biomass_mt, 
                  yend = biomass_mt),
    linetype = "dashed", 
    size = 2) +
  ggplot2::ggtitle(
    label = "GOA Pacific Ocean Perch Biomass 1984-2021", 
    subtitle = paste0("Mean = ", 
                      formatC(x = a_mean$biomass_mt, 
                              digits = 2, 
                              big.mark = ",", 
                              format = "f"), 
                      " Kmt")) +
  ggplot2::theme_bw()

flextable::flextable(head(dat))

survey_definition_id

biomass_mt

year

47

220.9105

1,984

47

241.4382

1,987

47

157.2951

1,990

47

483.6226

1,993

47

771.4128

1,996

47

727.0635

1,999

figure
Plot of POP biomass totals for GOA between 1984-2021 from `GAP_PRODUCTS.BIOMASS`.

Figure 1: Plot of POP biomass totals for GOA between 1984-2021 from GAP_PRODUCTS.BIOMASS.

Ex. 6: EBS Pacific Ocean perch CPUE and akgfmaps map

Pacific Ocean perch catch-per-unit-effort estimates for EBS in 2021 from GAP_PRODUCTS.CPUE and map constructed using akgfmaps. Here, we’ll use AKFIN HAUL and CRUISES data also included in this repo, for convenience, though they are very similar to their RACEBASE analogs.

dat <- RODBC::sqlQuery(channel = channel, 
                       query = 
"SELECT 
(cp.CPUE_KGKM2/100) CPUE_KGHA, -- akgfmaps is expecting hectares
hh.LATITUDE_DD_START LATITUDE,
hh.LONGITUDE_DD_START LONGITUDE

FROM GAP_PRODUCTS.CPUE cp

-- Use HAUL data to obtain LATITUDE & LONGITUDE and connect to cruisejoin
LEFT JOIN GAP_PRODUCTS.AKFIN_HAUL hh
ON cp.HAULJOIN = hh.HAULJOIN

-- Use CRUISES data to obtain YEAR and SURVEY_DEFINITION_ID
LEFT JOIN GAP_PRODUCTS.AKFIN_CRUISES cc
ON hh.CRUISEJOIN = cc.CRUISEJOIN

WHERE cp.SPECIES_CODE = 30060 
AND cc.SURVEY_DEFINITION_ID = 98 
AND cc.YEAR = 2021;")
# devtools::install_github("afsc-gap-products/akgfmaps", build_vignettes = TRUE)
library(akgfmaps)

figure <- akgfmaps::make_idw_map(
  x = dat, # Pass data as a data frame
  region = "bs.south", # Predefined EBS area
  set.breaks = "jenks", # Gets Jenks breaks from classint::classIntervals()
  in.crs = "+proj=longlat", # Set input coordinate reference system
  out.crs = "EPSG:3338", # Set output coordinate reference system
  grid.cell = c(20000, 20000), # 20x20km grid
  key.title = "Pacific Ocean perch") # Include in the legend title
[inverse distance weighted interpolation]
[inverse distance weighted interpolation]
flextable::flextable(head(dat))

CPUE_KGHA

LATITUDE

LONGITUDE

0

56.66721

-159.7800

0

56.98080

-159.6926

0

56.97832

-159.1519

0

57.31992

-159.0614

0

57.32157

-158.3811

0

57.65189

-158.3673

figure$plot + 
  ggplot2::guides(fill=guide_legend(title = "Pacific Ocean perch\nCPUE (kg/km2)"))  |>   
  change_fill_color(new.scheme = "grey", show.plot = FALSE)

footer.knit

NOAA README

This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.

NOAA License

Software code created by U.S. Government employees is not subject to copyright in the United States (17 U.S.C. §105). The United States/Department of Commerce reserve all rights to seek and obtain copyright protection in countries other than the United States for Software authored in its entirety by the Department of Commerce. To this end, the Department of Commerce hereby grants to Recipient a royalty-free, nonexclusive license to use, copy, and create derivative works of the Software outside of the United States.

NOAA Fisheries

U.S. Department of Commerce | National Oceanographic and Atmospheric Administration | NOAA Fisheries